17 research outputs found

    A framework for automatic and parameterizable memoization

    Get PDF
    Improving execution time and energy efficiency is needed for many applications and usually requires sophisticated code transformations and compiler optimizations. One such optimization technique is memoization, which saves the results of computations so that future computations with the same inputs can be avoided. In this article we present a framework that automatically applies memoization techniques to C/C++ applications. The framework is based on automatic code transformations using a source-to-source compiler and on a memoization library. With the framework, users can select functions to memoize as long as they obey certain restrictions imposed by our current memoization library. We show the use of the framework and the associated memoization technique, and their impact on reducing the execution time and energy consumption of four representative benchmarks.
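
    As a hedged illustration of the general technique (not the article's framework or the API of its memoization library), a function-level memoization wrapper in C++ might look like the sketch below; the function names and the cache layout are assumptions made only for this example.

        #include <cmath>
        #include <map>

        // Stand-in for a costly pure computation worth memoizing.
        double expensive_kernel(double x) {
            return std::sin(x) * std::exp(-x);
        }

        // Memoized wrapper: consult a table of previous results before recomputing.
        double expensive_kernel_memo(double x) {
            static std::map<double, double> cache;   // input -> previously computed result
            auto it = cache.find(x);
            if (it != cache.end()) {
                return it->second;                   // hit: the computation is skipped
            }
            double result = expensive_kernel(x);
            cache.emplace(x, result);                // miss: compute once, remember the result
            return result;
        }

    Such a cache is only valid for functions without side effects whose result depends solely on their arguments, which is consistent with the abstract's remark that memoized functions must obey certain restrictions.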

    LALP: a language to program custom FPGA-based acceleration engines

    No full text
    Field-Programmable Gate Arrays (FPGAs) are becoming increasingly important in embedded and high-performance computing systems. They allow performance levels close to the ones obtained with Application-Specific Integrated Circuits, while still keeping design and implementation flexibility. However, to efficiently program FPGAs, one needs the expertise of hardware developers in order to master hardware description languages (HDLs) such as VHDL or Verilog. Attempts to furnish a high-level compilation flow (e.g., from C programs) still have to address open issues before broader efficient results can be obtained. Bearing in mind the resources available on an FPGA, we developed LALP (Language for Aggressive Loop Pipelining), a novel language to program FPGA-based accelerators, together with its compilation framework, including mapping capabilities. The main ideas behind LALP are to provide a higher abstraction level than HDLs, to exploit the intrinsic parallelism of hardware resources, and to allow the programmer to control execution stages whenever the compiler techniques are unable to generate efficient implementations. Those features are particularly useful to implement loop pipelining, a well-regarded technique used to accelerate computations in several application domains. This paper describes LALP and shows how it can be used to achieve high-performance computing solutions.
    Funding: CNPq/Grices, FAPESP [573963/2008-8, 08/57870-9], FCT, Portugal [PTDC/EEA-ELC/70272/2006]
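
    LALP itself targets hardware and is not shown here; as a rough, hedged illustration of what loop pipelining means, the C++ sketch below overlaps the load of iteration i+1 with the compute and store of iteration i, the kind of stage overlap a pipelined datapath performs every clock cycle. The function and helper names are assumptions for this example only.

        #include <cstddef>

        // Stand-in for the per-element computation of the loop body.
        static int f(int v) { return v * v + 1; }

        // Conceptual pipelining of:  for (i) { v = a[i]; b[i] = f(v); }
        // The load for iteration i+1 overlaps the compute/store of iteration i.
        void pipelined_loop(const int* a, int* b, std::size_t n) {
            if (n == 0) return;
            int v = a[0];                          // prologue: issue the first load
            for (std::size_t i = 0; i + 1 < n; ++i) {
                int v_next = a[i + 1];             // stage 1 of iteration i+1
                b[i] = f(v);                       // stages 2-3 of iteration i
                v = v_next;
            }
            b[n - 1] = f(v);                       // epilogue: drain the last iteration
        }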

    Fast placement and routing by extending coarse-grained reconfigurable arrays with Omega Networks.

    No full text
    Reconfigurable computing architectures are commonly used for accelerating applications and/or for achieving energy savings. However, most reconfigurable computing architectures suffer from computationally demanding placement and routing (P&R) steps. This problem may prevent their use in systems requiring dynamic compilation (e.g., to guarantee application portability in embedded systems). Bearing in mind the simplification of the P&R steps, this paper presents and analyzes a coarse-grained reconfigurable array (CGRA) extended with global multistage interconnect networks, specifically Omega Networks. We show that integrating one or two Omega Networks in a CGRA makes it possible to simplify the P&R stage, resulting in both low hardware resource overhead and low performance degradation (18% for an 8 x 8 array). We compare the proposed CGRA, which integrates one or two Omega Networks, with a CGRA based on a grid of processing elements with rich neighbor interconnections and with a torus topology. The execution time needed to perform the P&R stage for the two array architectures shows that the array using two Omega Networks needs a far simpler and faster P&R. The P&R stage in our approach completed, on average, in about 16x less time for the 17 benchmarks used. Similar fast approaches needed CGRAs with more complex interconnect resources in order to allow most of the benchmarks used to be successfully placed and routed.
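
    The property that makes an Omega Network attractive for fast routing is that it is self-routing: the destination address alone fixes the switch setting at every stage. The C++ sketch below is a generic destination-tag routing illustration under that standard definition, not code from the paper; the function name and output format are assumptions.

        #include <cstdio>

        // Destination-tag routing through an N-input Omega network, N = 2^n_bits.
        // Each stage is a perfect shuffle (left rotate of the line address) followed
        // by a 2x2 switch whose output is chosen by one bit of the destination.
        void omega_route(unsigned n_bits, unsigned src, unsigned dst) {
            const unsigned mask = (1u << n_bits) - 1u;
            unsigned pos = src & mask;
            for (unsigned stage = 0; stage < n_bits; ++stage) {
                // Perfect shuffle: rotate the n_bits-wide line address left by one.
                pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask;
                // Switch setting: replace the low bit with the next destination bit (MSB first).
                unsigned bit = (dst >> (n_bits - 1 - stage)) & 1u;
                pos = (pos & ~1u) | bit;
                std::printf("stage %u: switch %u takes the %s output\n",
                            stage, pos >> 1, bit ? "lower" : "upper");
            }
            // After the last stage, pos == dst for any src: no search is ever needed.
        }

        int main() {
            omega_route(3, 6, 3);   // 8-input network: route input 6 to output 3
        }

    Because every source-to-destination pair is resolved this way in log2(N) fixed steps, routing reduces to copying destination bits into switch settings, which is consistent with the abstract's claim that the Omega Networks simplify the P&R stage.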

    A Method for Detecting Pathologies in Concrete Structures Using Deep Neural Networks

    No full text
    Pathologies in concrete structures, such as cracks, splintering, efflorescence, corrosion spots, and exposed steel bars, can be visually evidenced on the concrete surface. This paper proposes a method for automatically detecting these pathologies from images of the concrete structure. The proposed method uses deep neural networks to detect pathologies in these images, resulting in time savings and error reduction. The paper presents results for detecting pathologies in wide-angle images containing the overall structure, and also for the specific pathology identification task on cropped images of the pathology region. For pathology identification in cropped images, the task of classifying cracks could be performed with 99.4% accuracy using cross-validation. For pathology detection, wide images containing zero, one, or several pathologies in the same image were analyzed with the YOLO network to identify five pathology classes. The detection results with YOLO were measured with mAP (mean Average Precision) for the five classes of concrete pathology, reaching 11.80% for fissure, 19.22% for fragmentation, 5.62% for efflorescence, 27.24% for exposed bar, and 24.44% for corrosion. Pathology identification in concrete photos can be optimized using deep learning.
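
    As a small, generic illustration of the overlap criterion behind mAP (not the paper's evaluation code), the C++ sketch below computes the Intersection-over-Union used to decide whether a predicted pathology box matches a ground-truth box; the struct and function names are assumptions.

        #include <algorithm>

        // Axis-aligned bounding box in pixel coordinates (illustrative only).
        struct Box { double x1, y1, x2, y2; };

        // Intersection-over-Union: the overlap score used when matching a detected
        // box against a ground-truth box during mAP evaluation.
        double iou(const Box& a, const Box& b) {
            const double ix1 = std::max(a.x1, b.x1);
            const double iy1 = std::max(a.y1, b.y1);
            const double ix2 = std::min(a.x2, b.x2);
            const double iy2 = std::min(a.y2, b.y2);
            const double inter  = std::max(0.0, ix2 - ix1) * std::max(0.0, iy2 - iy1);
            const double area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
            const double area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
            const double uni = area_a + area_b - inter;
            return uni > 0.0 ? inter / uni : 0.0;
        }

    A detection typically counts as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold (commonly 0.5); precision-recall curves built from these matches are then averaged per class to obtain reported mAP values.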

    São Paulo e os sentidos da colonização [São Paulo and the meanings of colonization]

    No full text